# Hands on - Surfing Your Data using Azure SDK for Python

__Notebook Version:__ 1.0<br>
__Python Version:__ Python 3.8 - AzureML<br>
__Required Packages:__ azure-monitor-query (installed in the Warm-up section)<br>
__Platforms Supported:__  Azure Machine Learning Notebooks
     
__Data Source Required:__ Log Analytics tables 
    
### Description
This notebook provides step-by-step instructions and sample code to guide you through Azure authentication and Microsoft Sentinel log data discovery using the Azure SDK for Python and Kusto Query Language (KQL).<br>
*** No need to manually download or install other Python modules; the Warm-up section installs azure-monitor-query. ***<br>
*** Please run the cells sequentially to avoid errors. *** <br>
Need to know more about KQL? [Getting started with Kusto Query Language](https://docs.microsoft.com/azure/data-explorer/kusto/concepts/).

## Table of Contents
1. Warm-up
2. Azure Authentication
3. Log Analytics Data Queries

## 1. Warm-up

In [None]:
# If you need to know what Python modules are available, you may run this:
# help("modules")
!pip install azure-monitor-query

In [None]:
# Load Python libraries that will be used in this notebook
from azure.identity import AzureCliCredential, DefaultAzureCredential
from azure.monitor.query import LogsQueryClient, MetricsQueryClient, LogsQueryStatus
from azure.mgmt.loganalytics import LogAnalyticsManagementClient

from datetime import datetime, timezone
from IPython.display import display, HTML, Markdown
import pandas as pd
import json
import ipywidgets
import matplotlib.pyplot as plt

In [None]:
# Functions that will be used in this notebook
def read_config_values(file_path):
    "This loads pre-generated parameters for Microsoft Sentinel Workspace"
    with open(file_path) as json_file:
        if json_file:
            json_config = json.load(json_file)
            return (json_config["tenant_id"],
                    json_config["subscription_id"],
                    json_config["resource_group"],
                    json_config["workspace_id"],
                    json_config["workspace_name"],
                    json_config["user_alias"],
                    json_config["user_object_id"])
    return None

def has_valid_token():
    "Check to see if there is a valid AAD token"
    try:
        error = "ERROR: Please run 'az login' to setup account."
        expired = "ERROR: AADSTS70043: The refresh token has expired or is invalid"
        validator = !az account get-access-token
        
        if any(expired in item for item in validator.get_list()):
            return '**The refresh token has expired. <br> Please continue your login process. Then: <br> 1. If you plan to run multiple notebooks on the same compute instance today, you may restart the compute instance by clicking "Compute" on left menu, then select the instance, clicking "Restart"; <br> 2. Otherwise, you may just restart the kernel from top menu. <br> Finally, close and re-load the notebook, then re-run cells one by one from the top.**'
        elif any(error in item for item in validator.get_list()):
            return "Please run 'az login' to setup account"
        else:
            return None
    except Exception:
        return "Please login"
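
The `!az ...` syntax above only works inside IPython or Jupyter. As a rough sketch of the same check in plain Python, the block below runs the CLI through `subprocess` and classifies its output. The names `classify_token_output` and `check_token_with_subprocess` are hypothetical, the returned messages are shortened, and the sketch assumes the `az` executable is on the `PATH` of a Linux compute instance.

```python
import subprocess

# Substrings taken from the error messages checked in has_valid_token()
EXPIRED_MARKER = "AADSTS70043"
LOGIN_MARKER = "Please run 'az login'"


def classify_token_output(output_text):
    """Return None when no known error appears in the CLI output, else a short message."""
    if EXPIRED_MARKER in output_text:
        return "The refresh token has expired or is invalid"
    if LOGIN_MARKER in output_text:
        return "Please run 'az login' to setup account"
    return None


def check_token_with_subprocess(command=("az", "account", "get-access-token")):
    """Hypothetical plain-Python counterpart of has_valid_token()."""
    try:
        completed = subprocess.run(
            list(command), capture_output=True, text=True, timeout=60
        )
    except (OSError, subprocess.TimeoutExpired):
        # The CLI is missing or did not answer in time
        return "Please login"
    # The CLI may write errors to stderr, so inspect both streams
    return classify_token_output(completed.stdout + completed.stderr)


# Classify a sample error line without calling the CLI
print(classify_token_output("ERROR: Please run 'az login' to setup account."))
```

Keeping the classification separate from the CLI call makes it possible to check the message handling without being logged in.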


In [None]:
# Calling the above function to populate Microsoft Sentinel workspace parameters
# The file, config.json, was generated by the system, however, you may modify the values, or manually set the variables
tenant_id, subscription_id, resource_group, workspace_id, workspace_name, user_alias, user_object_id = read_config_values('config.json')
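
For reference, `read_config_values` expects a flat JSON object containing the seven keys it reads. The sketch below writes a throwaway file with placeholder values (every GUID and name is made up for illustration) and reads it back. `load_workspace_config` is a hypothetical variant that reports missing keys by name; it is not part of the notebook.

```python
import json
import os
import tempfile

# Keys read by read_config_values(), in the order they are returned
REQUIRED_KEYS = (
    "tenant_id",
    "subscription_id",
    "resource_group",
    "workspace_id",
    "workspace_name",
    "user_alias",
    "user_object_id",
)

# Placeholder values only; replace them with your own workspace details
sample_config = {
    "tenant_id": "00000000-0000-0000-0000-000000000000",
    "subscription_id": "11111111-1111-1111-1111-111111111111",
    "resource_group": "my-resource-group",
    "workspace_id": "22222222-2222-2222-2222-222222222222",
    "workspace_name": "my-sentinel-workspace",
    "user_alias": "someone",
    "user_object_id": "33333333-3333-3333-3333-333333333333",
}


def load_workspace_config(file_path):
    """Hypothetical variant of read_config_values that names any missing keys."""
    with open(file_path) as json_file:
        config = json.load(json_file)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError("config file is missing keys: " + ", ".join(missing))
    return tuple(config[key] for key in REQUIRED_KEYS)


# Write the sample to a temporary directory and read it back
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "config.json")
    with open(sample_path, "w") as sample_file:
        json.dump(sample_config, sample_file, indent=2)
    values = load_workspace_config(sample_path)

print(values[4])
```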

## 2. Azure Authentication

In [None]:
# The Azure CLI is used to get a device code for logging in to Azure; copy the code and open the device login site.
# The login command below passes --tenant $tenant_id; you may remove that option to use your default tenant.
login_message = has_valid_token()
if login_message is not None:
    # Show the message returned by has_valid_token() (expired token or missing login)
    display(Markdown(login_message))
    !echo -e '\e[42m'
    !az login --tenant $tenant_id --use-device-code

# Initialize the Azure LogsQueryClient, which is used to access Microsoft Sentinel log data in Azure Log Analytics.
# You may need to change resource_uri for various cloud environments.
resource_uri = "https://api.loganalytics.io"
la_client = LogAnalyticsManagementClient(AzureCliCredential(), subscription_id = subscription_id)
credential = DefaultAzureCredential()
la_data_client = LogsQueryClient(credential)

## 3. Log Analytics Data Queries

In [None]:
# Get all available tables using Kusto Query Language. If you need to know more about KQL, see the link provided in the Description section.
start_time=datetime(2022, 1, 1, tzinfo=timezone.utc)
end_time=datetime(2022, 12, 31, tzinfo=timezone.utc)
all_tables_query = "union withsource = SentinelTableName * | distinct SentinelTableName | sort by SentinelTableName asc"
tables_result = la_data_client.query_workspace(
        workspace_id=workspace_id,
        query=all_tables_query,
        timespan=(start_time, end_time))

if tables_result.status == LogsQueryStatus.SUCCESS:
    df = pd.DataFrame(data=tables_result.tables[0].rows, columns=tables_result.tables[0].columns)
    table_list =  list(df["SentinelTableName"])
    table_dropdown = ipywidgets.Dropdown(options=table_list, description='Tables:')
    display(table_dropdown)
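
The fixed date range used above only covers data from 2022. One alternative, sketched here on the assumption that you want a rolling window ending at the current time, is to compute the bounds at run time. The helper name `last_n_days` is hypothetical; the tuple it returns has the same `(start, end)` shape as the `timespan` argument used above.

```python
from datetime import datetime, timedelta, timezone


def last_n_days(days, now=None):
    """Return (start, end) as timezone-aware UTC datetimes covering the last `days` days."""
    if days <= 0:
        raise ValueError("days must be positive")
    # `now` can be supplied to get a reproducible window
    end = now if now is not None else datetime.now(timezone.utc)
    start = end - timedelta(days=days)
    return (start, end)


window_start, window_end = last_n_days(7)
print(window_start.isoformat(), "->", window_end.isoformat())
```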

In [None]:
# Get the column names of the table selected above by using the getschema operator,
# then display them in a dropdown.
# To look at the query, you may run: print(all_columns_query)
columns_result = None
column_list = None
all_columns_query = "{0} | getschema | project ColumnName | order by ColumnName asc".format(table_dropdown.value)
columns_result = la_data_client.query_workspace(
        workspace_id=workspace_id,
        query=all_columns_query,
        timespan=(start_time, end_time))

if columns_result.status == LogsQueryStatus.SUCCESS:
    df = pd.DataFrame(data=columns_result.tables[0].rows, columns=columns_result.tables[0].columns)
    column_list = list(df["ColumnName"])
    column_dropdown = ipywidgets.Dropdown(options=column_list, description='Columns:')
    display(column_dropdown)
else:
    column_list = []
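
Because the selected table name is formatted directly into the query text, it can be worth checking it before building further queries. The sketch below validates the name against a conservative identifier pattern (an assumption; adjust it if your table names use other characters) and builds a KQL query that counts events per day over the last 7 days using the `TimeGenerated` column. The function name `build_daily_count_query` is hypothetical.

```python
import re

# Conservative pattern: letters, digits and underscores, not starting with a digit
_TABLE_NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_daily_count_query(table_name, days=7):
    """Build a KQL query counting events per day for the last `days` days."""
    if not _TABLE_NAME_PATTERN.fullmatch(table_name):
        raise ValueError("unexpected table name: {0!r}".format(table_name))
    if not isinstance(days, int) or days <= 0:
        raise ValueError("days must be a positive integer")
    return (
        "{0} | where TimeGenerated > ago({1}d) "
        "| summarize EventCount = count() by bin(TimeGenerated, 1d) "
        "| order by TimeGenerated asc"
    ).format(table_name, days)


sample_query = build_daily_count_query("SecurityEvent")
print(sample_query)
```

The resulting string can be passed as `query` to `la_data_client.query_workspace` in the same way as `all_columns_query`.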